feature extractor
6a42b45af2b72e6e5b5e3a6fe695809f-Supplemental-Datasets_and_Benchmarks.pdf
The model can easily distinguish A from B using the background (i.e., the so-called geometric skews [26]) rather than the features of the class instance itself. However, suppose there is another class C that also appears on a black background. In this three-way classification task (distinguishing A, B, and C), an ideal model should focus on the features of the instance itself and not on the background. This is one of the difficulties: distribution bias in the samples, where some features (e.g., background) help classification but do not help in understanding the class in a compositional way. Another difficulty is the entanglement of the labels. We provide labels in a relative way, so that the label of A is '0' and that of B is '1', rather than their true textual meanings (e.g., white paper and green leaves). The concept information is entangled and embedded in the label, so it is hard for the model to tell which visual features capture the corresponding concepts (i.e., white refers to the color feature and paper refers to the texture feature). We hope our understanding of this issue can inspire researchers to focus more on compositionality and to design better continual learners.
af2bb2b2280d36f8842e440b4e275152-Supplemental-Conference.pdf
A.1 Proof of Theorem 1 In this proof, we adopt a simplified version of our message-passing function that ignores the skip-connection: The HGNN trained in the experimental results shown in Figure 2 also does not use skip-connections and hence represents a theoretically exact KTN component. In the real experiments, we use (1) skip-connections, exploiting their usual benefits (12), and (2) the trainable version of KTN. Without loss of generality, we prove the result for the case where R = {(s, t) : s, t ∈ T}, meaning the type of an edge is identified with the (ordered) types of the neighbor nodes. In other words, there is only one edge modality possible. For settings with multiple edge modalities, such as a social network with multiple edge types (e.g. "friendship" and "message"), the result extends trivially (though with more algebraically dense forms of a_ts and q_ts). The output of Aggregate is a concatenation of edge-type-specific aggregations (see Equation 3).
Adversarial Feature Desensitization
Neural networks are known to be vulnerable to adversarial attacks - slight but carefully constructed perturbations of the inputs which can drastically impair the network's performance. Many defense methods have been proposed for improving robustness of deep networks by training them on adversarially perturbed inputs. However, these models often remain vulnerable to new types of attacks not seen during training, and even to slightly stronger versions of previously seen attacks. In this work, we propose a novel approach to adversarial robustness, which builds upon the insights from the domain adaptation field. Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs. This is achieved through a game where we learn features that are both predictive and robust (insensitive to adversarial attacks), i.e. cannot be used to discriminate between natural and adversarial data. Empirical results on several benchmarks demonstrate the effectiveness of the proposed approach against a wide range of attack types and attack strengths. Our code is available at https://github.com/BashivanLab/afd.
A Training Regime
A.1 Implementation of the GPs We use the GPyTorch package for the computations of GPs and their kernels. The NN linear kernel is implemented in all experiments as a 1-layer MLP with ReLU activations and hidden dimension 16. For the Spectral Mixture Kernel, we use 4 mixtures. A.2 Sines Dataset For the first experiments on sine functions, we use the dataset from [9]. For each task, the input points x are sampled from the range [−5, 5], and the target values y are obtained by applying y = A sin(x − φ) + ε, where the amplitude A and phase φ are drawn uniformly at random from the ranges [0.1, 5] and [0, π], respectively.
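The task-sampling procedure described for the sines dataset can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_sine_task` and the `noise_std` parameter are hypothetical, and several symbols are read from a damaged extraction, so the input range [-5, 5], the minus sign inside the sine, the upper phase bound π, and the additive noise term are all assumptions.

```python
import math
import random


def sample_sine_task(num_points, rng, noise_std=0.0):
    """Draw one regression task y = A * sin(x - phase) + noise.

    Assumed ranges (see the lead-in): A in [0.1, 5], phase in [0, pi],
    x in [-5, 5]. noise_std is a hypothetical knob; 0.0 disables noise.
    """
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, math.pi)
    xs = [rng.uniform(-5.0, 5.0) for _ in range(num_points)]
    ys = []
    for x in xs:
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        ys.append(amplitude * math.sin(x - phase) + noise)
    return amplitude, phase, xs, ys


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the draw is reproducible
    amplitude, phase, xs, ys = sample_sine_task(5, rng)
    print(f"A={amplitude:.3f}, phase={phase:.3f}")
    for x, y in zip(xs, ys):
        print(f"x={x:+.3f}  y={y:+.3f}")
```

Each call yields a new (A, phase) pair, so a meta-learning loop would call the function once per task and split the returned points into support and query sets.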
Rebuttal for "Revisiting the Evaluation of Image Synthesis with GANs"
Our presentation is organized this way for the following reasons. In Section 2.3, we present the details of generative models, evaluated datasets, and analysis approaches (including our visualization tool, histogram matching attack, and human evaluation). They are independent of each other, so we discuss them in parallel in the main paper. In Section 3.1, we investigate the feature extractors by first identifying their attention on visual semantics, followed by investigating their robustness to the histogram matching attack. Finally, we filter extractors that define similar representation spaces. These studies progressively deepen, so they are organized in a progressive manner.
Revisiting the Evaluation of Image Synthesis with GANs
A good metric, which promises a reliable comparison between solutions, is essential for any well-defined task. Unlike most vision tasks that have per-sample ground truth, image synthesis tasks target generating unseen data and hence are usually evaluated through a distributional distance between one set of real samples and another set of generated samples. This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set. Extensive experiments conducted on multiple datasets and settings reveal several important findings. Firstly, a group of models that includes both CNN-based and ViT-based architectures serve as reliable and robust feature extractors for measurement evaluation. Secondly, Centered Kernel Alignment (CKA) provides a better comparison across various extractors and hierarchical layers in one model. Finally, CKA is more sample-efficient and enjoys better agreement with human judgment in characterizing the similarity between two internal data correlations. These findings contribute to the development of a new measurement system, which enables a consistent and reliable re-evaluation of current state-of-the-art generative models.
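To make the CKA comparison concrete, here is a small sketch of the linear variant of Centered Kernel Alignment between two feature sets. The abstract does not say which kernel or estimator the authors use, so the linear form, the helper names, and the toy inputs are all assumptions for illustration. Rows are samples and columns are feature dimensions.

```python
import math


def center_columns(features):
    """Subtract the per-column mean so every feature dimension has zero mean."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in features]


def cross_sq_frobenius(x, y):
    """Squared Frobenius norm of X^T Y for row-aligned sample matrices."""
    total = 0.0
    for i in range(len(x[0])):
        for j in range(len(y[0])):
            entry = sum(xr[i] * yr[j] for xr, yr in zip(x, y))
            total += entry * entry
    return total


def linear_cka(x, y):
    """Linear CKA: ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_sq_frobenius(xc, yc)
    denominator = math.sqrt(cross_sq_frobenius(xc, xc) * cross_sq_frobenius(yc, yc))
    return numerator / denominator


if __name__ == "__main__":
    real = [[1.0, 2.0], [2.0, 0.5], [3.0, 4.0], [4.0, 1.0]]
    scaled = [[2.0 * v for v in row] for row in real]
    print(linear_cka(real, real))    # identical feature sets
    print(linear_cka(real, scaled))  # isotropic rescaling of the same features
```

The score lies in [0, 1] and is unchanged by isotropic scaling or orthogonal transformation of either feature set. Note that this form needs the two sets to have the same number of rows.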
Object centric Cyclic Walks between Parts and Whole
Learning object-centric representations from complex natural environments endows both humans and machines with the ability to reason from low-level perceptual features. To capture compositional entities of the scene, we propose cyclic walks between perceptual features extracted from vision transformers and object entities. First, a slot-attention module interfaces with these perceptual features and produces a finite set of slot representations. These slots can bind to any object entities in the scene via inter-slot competitions for attention. Next, we establish entity-feature correspondence with cyclic walks along high transition probabilities based on the pairwise similarity between perceptual features (aka "parts") and slot-bound object representations (aka "whole").
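The idea of a cyclic walk driven by similarity-based transition probabilities can be illustrated with a toy sketch. Everything concrete here is an assumption rather than the paper's formulation: dot-product similarity, a softmax with a hypothetical temperature of 0.1, and a cross-entropy against the identity as one plausible training signal. A walk starts at a slot ("whole"), steps to a feature ("part"), and steps back. If slots and features correspond well, the round trip returns to the starting slot with high probability.

```python
import math


def softmax(values, temperature):
    """Numerically stable softmax over a list of similarity scores."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def transition_matrix(sources, targets, temperature=0.1):
    """Row i holds the probabilities of stepping from source i to each target."""
    return [softmax([dot(s, t) for t in targets], temperature) for s in sources]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def whole_parts_whole(slots, parts, temperature=0.1):
    """Round-trip probabilities slot -> part -> slot (num_slots x num_slots)."""
    slot_to_part = transition_matrix(slots, parts, temperature)
    part_to_slot = transition_matrix(parts, slots, temperature)
    return matmul(slot_to_part, part_to_slot)


def cycle_loss(round_trip):
    """Mean cross-entropy between the round-trip matrix and the identity."""
    n = len(round_trip)
    return -sum(math.log(round_trip[i][i]) for i in range(n)) / n


if __name__ == "__main__":
    slots = [[1.0, 0.0], [0.0, 1.0]]
    parts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    round_trip = whole_parts_whole(slots, parts)
    print(round_trip)
    print(cycle_loss(round_trip))
```

Because each step is row-stochastic, the round-trip matrix is row-stochastic as well, and its diagonal measures how reliably a walk returns to where it started.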